Abstract

In agile software development, test code can consider- 
ably contribute to the overall source code size. Being a 
valuable asset both in terms of verification and documen- 
tation, the composition of a test suite needs to be well un- 
derstood in order to identify opportunities as well as weak- 
nesses for further evolution. In this paper, we argue that the 
visualization of structural characteristics is a viable means 
to support the exploration of test suites. Thanks to general 
agreement on a limited set of key test design principles, such 
visualizations are relatively easy to interpret. In particular, 
we present visualizations that support testers in (i) locating
test cases; (ii) examining the relation between test code
and production code; and (iii) studying the composition of
and dependencies within test cases. By means of two case
studies, we demonstrate how visual patterns help to iden- 
tify key test suite characteristics. This approach forms the 
first step in assisting a developer to build up understanding 
about test suites beyond code reading. 



1 Introduction 

Pushed by the adoption of agile development methodolo- 
gies as well as the availability of free testing frameworks, a 
lot of unit tests have been written over the last few years. 
Such tests are specified persistently, thereby contributing to 
the size of a software project's artifacts. Studies report a 
ratio of test to production code that can reach 2:3, and
occasionally even 1:1 [10, 23]. As such, unit testing considerably
impacts a software project's development cost.

The benefits of unit testing are well known. In the short 
term, the application of unit testing results in software of 
higher quality [21][19]. Unit testing is observed to find
other defects [22][11] and is also reported to be considerably
cheaper than strategies relying solely on testing later
in the development cycle [11]. In the long term, unit tests
are a valuable asset during regression testing, able to notice
undesired side effects of changes [9, Ch. 6].

On the downside, unit test code needs to co-evolve with
production code in order to remain useful. Moreover, a test
suite is subject to the problem of design erosion as well,
gradually losing the initially intended design and thereby
becoming harder to understand and modify. Constructs in
the tests that hinder modification, e.g. complex test cases or
a resource-dependent test [10], directly affect developer productivity,
thereby amplifying the overall maintenance cost.
Studies indicate that regression testing can account for as
much as one-third of the total cost of a software system [16].

Therefore, in the context of a legacy system, the asso- 
ciated test suite contains both opportunities, in the form of 
well designed, isolated unit tests with a high coverage, as 
well as weaknesses, in the form of maintenance intensive 
test cases, components lacking coverage, etc. Evaluating 
the overall condition first requires identifying the location 
of test code and relating it to the corresponding produc- 
tion code. A first notion of coverage per component can 
be obtained by comparing production and test code size- 
wise. Next, to explore the amount and kind of test cases 
for individual production components, a developer has to 
study their interdependencies. Detecting maintenance in- 
tensive test cases, finally, requires studying their internals: 
typically through code reviewing. 

Code reviewing, with known reviewing rates around 150
to 200 lines of code per hour [4], was quickly found not to
be scalable, and research therefore turned to design
recovery techniques at a higher level of abstraction [5][20].
We identified two ways in which general design recovery 
techniques can exploit the more constrained context of test 
code. First, in contrast to the heterogeneous design heuris- 
tics for production code, design guidelines for test code are 
quite strict, emphasizing recurrent design idioms such as 
the setup-stimulate-verify-teardown cycle (S-S-V-T). Secondly,
the abstractions typically used in program comprehension
(e.g. classes, methods, invocations) lack testing
semantics. Testers reason in terms of test cases, fixtures
and assertions, suggesting a semantic layer on top of the 
abstractions employed in general design recovery. 

Accordingly, this work contributes to the general body of 
knowledge on software visualization by introducing a test 
suite representation that does exploit the more constrained 
context of test code, allowing developers to explore the 
composition of a test suite, navigating between corresponding
production units and test cases, identifying co-evolution
needs or spotting test design anti-patterns.

This paper is structured as follows. In Section 2 we recapitulate
desired unit test characteristics. We clarify how the
use of the S-S-V-T cycle as well as unit testing frameworks
assist in composing well-structured tests. The visualization
technique introduced in Section 3 exploits software design
elements in its abstraction. Next, we present three visual
presentations and discuss their interpretation (Section 4). In
Section 5 we report on two case studies, the findings
of which we validate by means of design documentation,
reports as well as an interview with a developer. After discussing
related work (Section 6) we wrap up (Section 7).

2 Test Suite Design 

In this section, we briefly introduce terminology, design 
guidelines and strategies that are commonly used during 
unit testing. 

Unit Testing Terminology - The standard unit testing 
terminology stems from Beck's pattern system [3]:

• a Unit under Test is the set of production classes 
(classes that contribute to the final software product) 
that is exercised together during testing. In a strict unit 
testing approach a unit corresponds to a single class. 

• a Test Case groups a set of tests performed on the same 
unit under test. Within JUnit a test case is speci- 
fied as a class, inheriting from the generic TestCase 
class offered by the test framework. 

• a Test Case Fixture is the set of attributes a test case 
requires to bring the unit under test into the desired 
state. The fixture consists of an instance of the unit 
under test as well as some shared test data. 

• a Test Command is a container for a single test. It is 
typically encapsulated in a method of a test case. 

• the Test Case Setup is a method of the test case in 
which the fixture is initialized into the desired state for 
testing. A corresponding Test Case Tear Down method 
releases resources again. 

Design Guidelines - Unit test design guidelines propose 
a strict structure for specifying tests: (i) acquire and initial- 
ize the necessary resources, (ii) send one or more stimuli to 
the unit under test, (iii) verify that the unit responds prop- 
erly; and finally (iv) release the acquired resources. These 
four steps are referred to as the setup-stimulate-verify-teardown
cycle (S-S-V-T). The first step, performed in a Test
Case Setup, is repeated before every Test Command in the
Test Case. Each Test Command stimulates and verifies the
unit under test.

^ JUnit is the Java implementation of the xUnit family of testing
frameworks, the de facto framework for unit testing today.

Unit tests are typically specified in the same program- 
ming language as the system under test, yet are not tested 
extensively themselves. To support code reviewing as the 
main means for test quality assurance, test cases are re- 
quired to be concise, transparent in their objectives and 
isolated in implementing the S-S-V-T cycle, forming an en- 
capsulated test. Test Commands exercising the same unit 
under test are gathered in a Test Case, thereby sharing an 
explicit Test Case Fixture and Setup. 
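
As an illustration of these guidelines, the following minimal sketch uses Python's unittest module, another member of the xUnit family, rather than JUnit. The Stack class and all test names are hypothetical; they only show where the fixture, the setup, the test commands and the tear down of a test case live.

```python
import unittest


class Stack:
    """Hypothetical production class acting as the unit under test."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def size(self):
        return len(self._items)


class StackTest(unittest.TestCase):
    """Test Case: groups the Test Commands for one unit under test."""

    def setUp(self):
        # Setup: bring the explicit fixture (self.stack) into a known state.
        self.stack = Stack()
        self.stack.push("a")

    def tearDown(self):
        # Tear down: release the fixture after every Test Command.
        self.stack = None

    def test_push_increases_size(self):
        # Test Command: stimulate the unit, then verify its response.
        self.stack.push("b")
        self.assertEqual(self.stack.size(), 2)

    def test_pop_returns_last_item(self):
        self.assertEqual(self.stack.pop(), "a")
        self.assertEqual(self.stack.size(), 0)


def run_stack_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StackTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because setUp and tearDown run around every test command, both commands start from the same fixture state and remain isolated from each other.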

Unit Testing Strategies - The testing plan of a typical 
software system entails many strategies: unit tests verify the 
functionality of small units at a time, integration tests are fo- 
cused on the interaction between components, system tests 
consider the behaviour of the system overall, etc. Despite 
the clearly distinctive objectives for each strategy, overlap 
between strategies occurs, such as unit tests bearing proper- 
ties of another testing strategy: 

• Units that are tightly coupled with many other units 
require more effort to isolate, e.g. by means of test 
stubs taking the place of external units. Therefore, a 
tester may decide to unit test without isolating - i.e. 
setting up a larger unit under test only part of which 
is the actual unit under test. Such a test case leans
towards integration testing.

• Certain units are always used by the same other (set of)
units. During testing, such a unit might also be exercised
via this set, for ease of setup or because this more closely
resembles the actual usage scenario, resulting in multiple
units to be considered. This testing approach is called
indirect testing. Moonen and van Deursen argue that
this makes understanding and debugging harder [10].

• When large data sets are required for testing certain 
units, this data is sometimes stored in files and loaded 
during test setup. The reading functionality is typi- 
cally abstracted and shared among test cases. Test code 
classes that implement this functionality are referred to 
as test helpers. 

• A system-wide input/output testing approach can be fed
with test input data specifically chosen to exercise particular
units.

Considering this variation, determining the kind of unit tests
present in a test suite is worthwhile, as it may steer upcoming
re-engineering tasks.

3 Visualizing Test Suites 

In this work, we propose a visualization technique assist- 
ing re-engineers to analyze and comprehend the structure 
and quality of the test suite of large systems. To describe 
our visualization, we use the five-dimensional framework
of Maletic et al. [18]. This framework was proposed for
development and maintenance of large-scale software and 
stimulates the user to describe how a technique assists in 
completing a particular software development task. 

3.1 Tasks 

The proposed visualization supports three program understanding
tasks that we call First Contact, Understand
Unit(s) and Assess Test Case(s). Together, they form a top-down
and phased approach to explore a test suite.

Perform First Contact. In this task, a developer builds
up an overall mental model of a system at a high level of
abstraction [9]. Initial understanding of the associated test
suite is obtained by: (i) locating the test suite code in
the source tree; (ii) looking at overall coverage to get a notion
of covered as well as uncovered components; and (iii)
studying test suite design to grasp the granularity of units
that have been used.

Understand Unit(s). Koenemann and Koch observed that
programmers only study code in case they are convinced of
its relevance in the context of a particular task [17]. As unit
tests also serve the purpose of live documentation, explaining
in simple scenarios how a unit is (and is not) supposed
to be used, information about relevant test cases forms a
next step towards the actual modification of the code. Secondly,
coverage information can reveal important parts, as
we assume that critical, frequently changing or core parts
are tested more extensively.

Assess Test Case(s). After having identified relevant unit 
tests, i.e. test cases directly invoking production units of in- 
terest, the next step consists of evaluating the internal struc- 
ture of individual test cases. Well-designed unit tests are 
most suited for documentation purposes, as they (i) are easy 
to understand; and (ii) specify how a particular unit is (and 
is not supposed to be) used in an isolated scenario. Test 
cases may present certain maintenance challenges as well. 
Weakly isolated test cases, requiring a complex setup or 
bearing unrealistic run-time expectations, become costly to 
maintain due to frequent changes, slow execution or seemingly
random failures [10].

Summarizing, for three maintenance tasks we identified 
three information topics concerning the test suite: Test Location,
Test Coverage and Test Design. Table 1 shows which
information topics are required during the three tasks. For 
each of them we will introduce a separate view, i.e., a filter 
mechanism on the overall visualization technique. 



^In the context of this paper, we use a coarse grained definition for 
coverage: a class is covered by a test case when at least one of its methods 
is invoked by a test command.

Table 1. How are test suite exploration topics
covered by the presented views? 



3.2 Audience 

The visualization technique is intended to assist software 
engineers in exploring the test suite of an unfamiliar system, 
as required in the following typical scenarios: 

• A newcomer to the project who is asked to build up
knowledge quickly and relatively independently of
the existing team.

• A team of developers assigned to a major modification 
of a stabilized, long running legacy system may use it 
to characterize the available tests, thereby serving as a 
reference for communication. 

• A re-engineer, asked to analyze a system X's opportu- 
nities and threats when considered to be integrated into 
system Y, will be interested in the test suite as well. 

3.3 Target 

The target defines the characteristics of the software system
to be visualized [18]. In this work, we are interested
in the structure of test suites as well as the relationship be- 
tween test suites and production code. In a first step, we 
fetch information from the system's source code using a 
static fact extractor. This results in a model of the system ac- 
cording to the formalism of the Object Oriented Framework 
for Coupling and Cohesion (OOFCC) specified by Briand 
et al. [6], i.e. a formalism to query and count in terms
of classes, methods, invocations, etc. Secondly, we iden- 
tify entities that belong to test code and adapt the model 
representation according to the OOFCC refinement that we 
proposed for test code that (i) formalizes unit test concepts 
as entities and relationships in a test model, (ii) describes 
how OOFCC entities map onto test concepts for common 
implementations of xUnit and (iii) provides the heuristics 
(type, inheritance and ownership properties as well as nam- 
ing conventions) required to implement a refining model 
transformation [26]. Due to space constraints, we do not reproduce
the formalism here. Next, for each view we query
this model and compose the input for a graph visualization
tool. These steps are automated in a tool called FETCH.

^ FETCH stands for Fact Extraction Tool CHain, developed at the University of
Antwerp. Available at http://www.lore.ua.ac.be/Research/Artefacts/

The model entities of [26] were already introduced (informally)
in Section 2 (Unit Testing Terminology). We furthermore
distinguish three types of relations:

• Containment relations represent the hierarchical de- 
composition of a software system in containers. 
Classes belong to a parent package, methods belong 
to a class, etc. This decomposition applies to both pro- 
duction and test code. 

• A coverage relation is a relation between a test entity
A and a production entity B where A has at least one
invocation towards B. As such, we only consider direct
relations between production and test code, in line
with a developer's information needs during the explo- 
ration of a test suite's composition, rather than obtain- 
ing finer-grained, exact coverage measures. 

• Test dependencies are relations between test entities, 
e.g., revealing certain abstractions or recurrent helper 
functions in the test design. In practice, such mech- 
anisms are typically implemented in an (abstract) test 
case superclass. As such, we denote an inheritance re- 
lation between two test cases as a test dependency. 
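
The relations above can be sketched over a toy fact base as follows. This is a minimal illustration under stated assumptions: the Model layout, the function names and the example entities are hypothetical and do not reproduce FETCH or the OOFCC formalism.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Toy fact base; the layout is an assumption, not the OOFCC formalism."""
    parent: dict = field(default_factory=dict)        # entity -> container (containment)
    test_entities: set = field(default_factory=set)   # entities identified as test code
    invocations: set = field(default_factory=set)     # (caller, callee) pairs
    inherits: set = field(default_factory=set)        # (subclass, superclass) pairs


def coverage_relations(model):
    """Direct invocations from a test entity A to a production entity B."""
    return {
        (a, b)
        for a, b in model.invocations
        if a in model.test_entities and b not in model.test_entities
    }


def dependencies_between_tests(model):
    """Inheritance relations in which both ends are test entities."""
    return {
        (sub, sup)
        for sub, sup in model.inherits
        if sub in model.test_entities and sup in model.test_entities
    }


def lift_to_classes(model, relations):
    """Coarse-grained view: map method-level edges to their owning classes."""
    return {(model.parent[a], model.parent[b]) for a, b in relations}


def example_model():
    """Hypothetical system: one production class, one test case, one helper."""
    m = Model()
    m.parent = {
        "Parser": "core",
        "ParserTest": "core",
        "BaseTest": "core",
        "Parser.parse": "Parser",
        "ParserTest.testParse": "ParserTest",
    }
    m.test_entities = {"ParserTest", "BaseTest", "ParserTest.testParse"}
    m.invocations = {
        ("ParserTest.testParse", "Parser.parse"),
        ("Parser.parse", "Parser.parse"),  # production-internal call, not coverage
    }
    m.inherits = {("ParserTest", "BaseTest")}
    return m
```

Lifting the method-level edges to classes corresponds to the coarse-grained notion of coverage used in this paper: a class counts as covered as soon as one of its methods is invoked by a test command.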

3.4 Representation & Medium 

Directed graphs have been shown to be natural representations
of software systems [15, 24]. Table 2 explains which
entities and relations we represent as nodes and edges. The
symbols for class and test case vary per view. For the
'medium' dimension, characterizing where the visualization
is to be rendered [18], we adopted Guess, an
exploratory data analysis and visualization tool for
graphs and networks [1], as a part of FETCH. This environment
assists the user in graph exploration through capabilities
such as applying graph layouts, highlighting, zooming
and moving as well as customizable filtering.

Table 2. Visualization Legend



4 Three Test Suite Views 

In this section we present three views as filters on the 
overall graph representation introduced above. Each view 
corresponds to an exploration task. We expand upon the 
intent and motivation for each view, and the interpretation 
that should be given to any of the observed indicators. 



4.1 System-wide Test Suite View

The core of this visualization is the hierarchical decom- 
position of a software system into packages and classes. 
The filter we apply on the graph therefore skips all entities
below the class level. To distinguish package entities
from class entities we use square and circle shapes respectively.
The Graph EMbedder (GEM) algorithm applied on
the containment edges provides a layout that is both easy to
interpret and aesthetically pleasing [13].

In line with the re-engineering pattern "Study the exceptional
entities" [9], entities exhibiting either a lack of or,
conversely, plenty of incoming and outgoing edges deserve special
attention. In the former case, one has to look further for
other testing strategies. In the latter case, an important role 
during testing can be assumed. 
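
A rough way to spot such exceptional entities is to count edges per node. The sketch below only considers incoming coverage edges of production classes; the function name, the example data and the threshold `many` are illustrative assumptions, not values taken from the case studies.

```python
from collections import Counter


def exceptional_entities(production_classes, coverage_edges, many=3):
    """Split classes into those without and those with many incoming edges.

    coverage_edges holds (test_case, production_class) pairs; `many` is an
    arbitrary illustrative threshold.
    """
    incoming = Counter(target for _source, target in set(coverage_edges))
    uncovered = sorted(c for c in production_classes if incoming[c] == 0)
    heavily_covered = sorted(c for c in production_classes if incoming[c] >= many)
    return uncovered, heavily_covered


# Hypothetical example: three production classes and four coverage edges.
classes = ["Project", "Target", "Helper"]
edges = [
    ("ATest", "Project"),
    ("BTest", "Project"),
    ("CTest", "Project"),
    ("ATest", "Target"),
]
uncovered, heavily_covered = exceptional_entities(classes, edges)
```

In this example, Helper receives no coverage edge and would call for a look at other testing strategies, while Project is a candidate for an important role during testing.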

4.1.1 Test Location 

Intent. Localize the system test code. 

Motivation. Typically, localization of test code is a configuration
management responsibility, as it is a consequence of
source tree structure in the version control system. Feathers
discusses pros and cons of possible test code locations:
the whole test suite may be gathered in a common location,
test cases may be stored per component, but may as well
reside among production code [12].

Interpretation. A consistent location means that we iden- 
tified earlier project conventions that we can rely on. In case 
of no dominant visual indicators, we can deduce the absence 
of such conventions. The view still helps to determine the 
test code associated with a particular component. 

Visual Indicators. We demonstrate two kinds of locations 
deducible from the system-wide view (see Figure 1).

• Test cases located in the same package as produc- 
tion code will result in package nodes containing both 
white and black nodes. 

• We observe two packages, one filled with white nodes 
next to one with black nodes, connected by coverage 
edges when test code resides in another package than 
production code. 
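
The two location patterns can also be derived directly from the containment relation. The following sketch labels each package by the kind of classes it contains; the function name, the labels and the example packages are hypothetical.

```python
def classify_test_location(package_members, test_classes):
    """Label each package as 'mixed', 'test-only', 'production-only' or 'empty'."""
    labels = {}
    for package, members in package_members.items():
        has_test = any(m in test_classes for m in members)
        has_production = any(m not in test_classes for m in members)
        if has_test and has_production:
            labels[package] = "mixed"            # test cases next to production code
        elif has_test:
            labels[package] = "test-only"        # separate test package
        elif has_production:
            labels[package] = "production-only"
        else:
            labels[package] = "empty"
    return labels


# Hypothetical example covering both location conventions.
packages = {
    "core": ["Parser", "ParserTest"],
    "io": ["Reader"],
    "io.tests": ["ReaderTest"],
}
tests = {"ParserTest", "ReaderTest"}
locations = classify_test_location(packages, tests)
```

A system in which most packages are labelled 'mixed' follows the first convention; pairs of 'production-only' and 'test-only' packages connected by coverage edges indicate the second.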

4.1.2 Test Coverage 

Intent. Obtain a basic notion of test coverage. 

Motivation. The desired notion of test coverage is cheap 
to obtain and scalable. It helps to make assumptions about 
earlier test efforts and system-critical components. Ob- 
serving coverage for evolving components gives an impres- 
sion about the risks (and possible counteractions) for further 
modifications.

Figure 1. Possible locations of test cases: (a) Same Package; (b) Different Package

Interpretation. Components that are not directly tested
might be trivial (e.g. data holders), might deliberately be
tested only in conjunction with other components, or might
not have been tested at all, e.g., due to time constraints. For
more strongly tested components we assume a more important
role such as being critical, frequently changing, belonging
to the system core, etc. These assumptions need to be verified
in subsequent exploration tasks, eventually by
obtaining actual, fine-grained coverage measures.

Visual Indicators. Components not covered by unit tests
will show as clusters of classes (i) without test cases in
the same package; and (ii) without incoming test coverage
edges (e.g., Figure 2(a)). The components in Figure 1 serve
as examples of better covered components. Highly covered
classes, such as class A in Figure 2(b), are represented as
nodes receiving many test coverage edges.

Figure 2. Test Coverage Indicators: (a) Untested Components; (b) Highly Covered Class

4.1.3 Test Design

Intent. Grasp the overall test suite design.

Motivation. The test design reveals first-hand information
on what kind of testing strategies have been applied in the
past. This tells us at which point such tests become most
useful as well as how difficult tests will be to understand
and modify.

Interpretation. To grasp testing strategies, we mainly
look at the size of a unit and the presence of test helpers.
Units of a limited size (e.g. a single class) can play an important
role as test harness for local changes, due to being easier
to understand and modify. When units are larger or when
dealing with a more integration testing style, tests are more
suited to give feedback about the overall status of a modified
system. Such tests risk being more complex and change sensitive,
however, due to the many dependencies. Test helpers
are typically used to abstract away recurring setup, stimulate
or verification behavior, but also to facilitate access to
more complex test data and input/output tests. As reusable
entities, test helpers help to avoid duplication in test code.

Visual Indicators. Units tested in isolation are shown as
package-clustered nodes that receive a limited number of edges
from similarly clustered test case nodes (Figure 3(a)). We
identify indirect tests in case test cases do not cover the
units expected from the identified test location, but rather
(i) access units in other packages or (ii) multiple test cases
target a single unit in a package as in Figure 3(b). A test
case receiving many test dependency edges, such as A in
Figure 3(c), serves, at least partially, as test helper and forms
as such an opportunity during test suite extension.

Figure 3. Test Design Indicators: (a) An isolated (A) and a less isolated component (B); (b) Indirect Tests via interface; (c) Test Dependency

4.2 Unit under Test View 

This view focuses on an individual production class visualized
in terms of its accessible methods. Test cases invoking
one or more methods of this unit are displayed as well, with
coverage edges drawn from the test commands to the in- 
volved methods. To distinguish classes from their methods, 
these entities are drawn as squares and circles respectively. 

4.2.1 Test Location 

Intent. Identify test cases for a particular unit under test. 

Motivation. During the evolution of a component, a developer
gathers the set of production classes as well as test cases 
that require modification. 

Interpretation. Depending on past test strategies, a unit 
can be exercised by one or more test cases. In case unit test- 
ing as well as integration testing are specified in test code, 
units are covered by test cases of each strategy. Test cases 
might also be split up when growing too large [10].

Visual Indicators. Trivially, production classes that are
not covered directly, and as such are not likely to be the
subject of a focused unit test, will show up in the view as
untested units, i.e. components without black test nodes. Every
involved test case will be represented in terms of its test
commands. Coverage edges make the relationship with the
unit explicit (see Figure 4).

4.2.2 Test Coverage 

Intent. Which parts of a unit are being tested? 

Motivation. Once the scope of interest for a certain main- 
tenance task has been reduced to a couple of units, the Unit 
under Test view presents how these units are covered by test 
commands of involved test cases. These test commands are 
worth exploring in detail, because of their documentation 
power as well as their co-evolution needs. 

Interpretation. Complementary to assumptions derived 
from studying the location of involved test cases, in the 
context of coverage assessment the focus lies on identifying 
combinations of production methods being exercised. This 
allows the developer to understand which methods are not 
directly tested, methods covered by means of simple sce- 
narios and methods tested together within a test command. 

Visual Indicators. An example of Multi-Test Case Coverage
is shown in Figure 4: a unit receiving coverage edges
from multiple test cases A, B and C. If only A had
existed, we would have encountered a unit with partial coverage, i.e.
only a small part of its methods being directly covered.
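
The information shown in this view can be summarized per test case. The sketch below is a minimal illustration, assuming coverage edges are available as (test case, test command, production method) triples; the function names and the example data are hypothetical.

```python
def method_coverage(unit_methods, coverage_edges):
    """Per test case, collect the methods of the unit that it invokes directly."""
    covered = {}
    for test_case, _command, method in coverage_edges:
        if method in unit_methods:
            covered.setdefault(test_case, set()).add(method)
    return covered


def uncovered_methods(unit_methods, coverage_edges):
    """Methods of the unit that no test command invokes directly."""
    hit = {method for _case, _command, method in coverage_edges}
    return sorted(set(unit_methods) - hit)


# Hypothetical unit with four methods, covered by test cases A, B and C.
methods = ["m1", "m2", "m3", "m4"]
edges = [
    ("A", "testOne", "m1"),
    ("B", "testTwo", "m1"),
    ("B", "testTwo", "m2"),  # m1 and m2 are tested together in one command
    ("C", "testThree", "m3"),
]
per_case = method_coverage(methods, edges)
missing = uncovered_methods(methods, edges)
```

Here the unit has multi-test case coverage, test case A alone would give partial coverage, and m4 is not directly tested.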




Figure 4. Multi-Test Coverage 

4.3 Test Case View 

The Test Case View centers around individual test cases, 
which are represented in terms of the S-S-V-T entities and 
the exercised production units. Again, classes are repre- 
sented as squares; methods as circles. In addition to the 
nodes introduced in the visualization technique, this view 
adds two meta-nodes named Fixture and Test Commands, 
thereby making these two test concepts explicit. 

4.3.1 Test Design 

Intent. Identify opportunities in the form of well designed 
test cases as well as possible maintenance threats by study- 
ing the internal structure of selected test cases. 



Motivation. Test cases that are designed according to strict 
unit test design guidelines, with explicit fixture and concise 
setup and test commands, are an opportunity to understand 
(i) typical usage of the unit under test as well as (ii) how 
the test suite covers such units. Integration-style test cases
demonstrate how components interact. Using method-level 
information a developer can better motivate whether a cer- 
tain test case is a possible threat or rather a manifestation of 
a certain test strategy. 

Interpretation. Deviations from the design guidelines are 
potential maintenance threats. A list of complex and thus 
undesired test structures can be found in [10].

• Test cases can lack an explicitly defined fixture, 
thereby removing the distinction between the defined 
unit of interest and surrounding helper units. 

• A test command invoking many production methods, 
possibly from multiple production classes, entails a 
complex (integration) test scenario. 

• A test case with a large fixture that is only partially used
by individual test commands indicates that the contained
test commands do not logically belong together,
therefore violating the guidelines of encapsulation and
transparency in test objective.

Visual Indicators. Figure 5(a) shows an example of a well
designed test case. It contains an isolated and explicit fixture,
the methods of which are consistently tested by single
test commands. The Lack of Explicit Fixture (Figure 5(b))
renders a test case more difficult to understand, as the common
unit under test (if at all present) is implicitly interwoven
within every single test command. Making the fixture
explicit implies introducing a test case attribute that is initialized
by the Test Case Setup. A Complex Test Scenario
in a test command can be recognized by the many coverage
edges that target production methods (e.g. test command
A in Figure 5(c)). Moreover, the Large Fixture of this test
case is diagnosed by the many production class entities in
the fixture that are extensively, yet not fully, shared among
the test commands. Within a unit test suite, we identify the
more Integration Test type of test case (e.g., Figure 5(d)) by
the multiple production classes that are accessed without a
dominant unit under test.
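
These indicators lend themselves to simple checks on the test case model. The following sketch is only one possible reading: the function name, the data layout and both thresholds are assumptions made for illustration, not part of the presented technique.

```python
def design_indicators(fixture, commands, max_targets=3, large_fixture=3):
    """Return the design indicators that apply to one test case.

    fixture: set of fixture attributes declared by the test case.
    commands: dict mapping a test command to a pair
              (fixture attributes it uses, production methods it invokes).
    max_targets, large_fixture: arbitrary illustrative thresholds.
    """
    indicators = []
    if not fixture:
        indicators.append("lack of explicit fixture")
    if any(len(targets) > max_targets for _used, targets in commands.values()):
        indicators.append("complex test scenario")
    partially_used = any(
        set(used) < set(fixture) for used, _targets in commands.values()
    )
    if len(fixture) >= large_fixture and partially_used:
        indicators.append("large, partially used fixture")
    return indicators


# Hypothetical test cases, loosely mirroring the situations of Figure 5.
well_designed = design_indicators(
    {"stack"},
    {"testPush": ({"stack"}, {"Stack.push"}),
     "testPop": ({"stack"}, {"Stack.pop"})},
)
no_fixture = design_indicators(set(), {"testA": (set(), {"A.run"})})
complex_large = design_indicators(
    {"a", "b", "c"},
    {"testAll": ({"a", "b"}, {"A.x", "A.y", "B.z", "C.w"}),
     "testOne": ({"a", "b", "c"}, {"A.x"})},
)
```

Such checks can only point at candidates; whether a flagged test case is a maintenance threat or the manifestation of a deliberate test strategy still requires the method-level inspection described above.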

Figure 5. Test Design Indicators: (a) Well-designed Test Case; (b) Test Case with lack of explicit fixture; (c) Large Fixture and Complex Test Scenario; (d) Integration Test

5 Case Studies

In this section we report on two case studies to evaluate
our visualization technique. We used FETCH to statically
extract the required information and compose the graphs.
As a first project, we selected the open source build system
Apache Ant, a medium-sized, industrial-strength software
system. Secondly, we analyzed a small system, cpp2famix,
that is developed using a strict unit testing approach. This
allows us to compare test suites resulting from two testing
strategies. For each case study, we undertake the three identified
program understanding tasks and formulate our findings
based upon what we derive from the three views. Next,
we validate these findings by skimming through the documentation,
by looking at external references or, in the case
of cpp2famix, by interviewing the developer.

5.1 Apache Ant 



o.a.t.ant.types.{AbstractFileSet,Path} as key classes 
through their extensive test coverage. 

Understand Units. In the 1.6.4 release of Ant, sev- 
eral bugs where found in the directory scanner as well 
as the unzip and untar feature^ Therefore, we analyze 
the existing testing facilities for these units. The con- 
cept of a directory scanner is implemented in the class 
o.a.t.ant.Directory Scanner. Using the Unit under Test view 
in FigurejTj we note that four test cases exercise this produc- 
tion class: one test case is exercising eight out of twenty- 
two production methods, the other three invoke just two 
methods. Therefore, we derive that the actual unit test for 
this unit is o.a.t.ant.Directory ScannerTest, while the three 
other test cases are only using the directory scanner as a 
helper unit. Indeed, these test cases only invoke so called 
getter methods, fetching data from the DirectoryScanner 
object to evaluate the expected result for their unit under 
test against the actual test result. 



D i recto ryScannerJest 



DirectoryScanner 
DefaultExcludesTest 




•DependTes^^ 



ClassFil 



eSetTest 



As a first case study we use the well-known Apache 
Ant project. The 1.6.5 release consists of about 104 
kSLOC, 18 kSLOC (17%) of which is JUnit test code. 

5.1.1 Findings 

First Contact. Figure [6] gives an overview of the system- 
wide view for test suites, based on part of the system 
core (due to scalability constraints on paper, we filtered 
out entities and relations other than the core packages 
o.a.J^ant and o.a.t.ant.taskdefs). We observe the presence 
of a considerable amount of test cases, residing in the same 
packages as production code. For packages outside of the 
core, the testing strategy seems different: some components 
seem not tested at all (e.g. o.a.t.ant.helper), others are 
tested in isolation (e.g. o.a.t.zip). Although located among 
production classes, we notice that Ant's test cases do 
not cover the units in the same package. Combined with 
the fact that many tests are covering o.a.t.ant.Project 
and depend upon o.a.t.ant.tools.BuildFileTest, we assume 
an indirect testing approach. Next to o.a.t.ant.Project, 
we identify o.a.t.ant.util.{FileUtils,JavaEnvUtils} and 

^abbreviation for org.apache.tools 



Figure 7. Unit o.a.t.DirectoryScanner. 

The untar functionality is implemented in
o.a.t.ant.taskdefs.Untar, which shows as untested. However,
we do know from the System-Wide View that, for the
Ant project, test cases are located in the same package as
the production classes. We identify UntarTest as the actual
test case based on naming.
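
Such a naming-based lookup can be sketched as follows. This is a minimal illustration under two assumptions: test cases live in the same package as the production class, and they follow a "<Name>Test" or "Test<Name>" convention. The function name and the list of known tests are hypothetical.

```python
from typing import Iterable, List


def find_tests_by_name(production_class: str, test_classes: Iterable[str]) -> List[str]:
    """Return test classes in the same package whose name matches the
    assumed conventions <Name>Test or Test<Name>."""
    package, _, simple_name = production_class.rpartition(".")
    prefix = f"{package}." if package else ""
    candidates = {f"{prefix}{simple_name}Test", f"{prefix}Test{simple_name}"}
    return sorted(t for t in test_classes if t in candidates)


# Hypothetical excerpt of the test classes known to the model.
known_tests = [
    "org.apache.tools.ant.taskdefs.UntarTest",
    "org.apache.tools.ant.DirectoryScannerTest",
]
matches = find_tests_by_name("org.apache.tools.ant.taskdefs.Untar", known_tests)
print(matches)
```

A match on naming only suggests which test case is intended for the unit; it says nothing about whether that test case invokes the unit directly.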

Assess Test Cases. o.a.t.ant.DirectoryScannerTest is observed
to be a test case rich in test commands. Most of these
test commands appear to be similar in composition, targeting
the same side objects of Project and FileUtils.
UntarTest is characterized by indirect testing behaviour,
relaying via BuildFileTest, Project and FileUtils.
Neither of the two test cases has an explicitly defined
fixture. In summary, both test cases make use of key system
classes to exercise the unit under test.

5.1.2 Validation 

Using the code coverage tool Emma, we compute a
method coverage of 65% (80% class coverage) for Ant, 

^mentioned in the release notes of version 1.6.5 
^http://emma.sourceforge.net/




Figure 6. Ant System-wide Test View 



confirming our initial impression of a reasonably tested
system. From Ant's documentation we derive that the key
classes in the design, as identified by its architects, include
Project, Task and Target. The documentation header of
o.a.t.ant.Project describes this class as the central
representation of an Ant project. As this class also provides the
means to start a build, its frequent usage by test cases
confirms the indirect testing approach in which several project
scenarios are constructed, executed and verified using this
generic Project class. The focus of Ant on Java software
and the fact that build instructions are specified in XML files
explain the importance of the other classes we noted. By
looking into one of the test cases making use of the FileUtils
class, we were able to find the XML files containing
test data in the distribution. Thus, Ant's documentation
confirms our assumption that production classes invoked by
many test cases play a major role in the system itself.

Our claim that o.a.t.ant.tools.BuildFileTest is an important
test class is backed up by Van Geet and Zaidman,
who identify it as an abstraction of a unit test that uses a
build file as test data [25]. For each test run, a Project
instance is created which loads this XML file and executes the
contained build instructions. This class thus indeed serves
as a test helper, used by no fewer than 414 test commands.

The Ant case reveals a major limitation of our visualization
technique: when dealing with indirect tests, we fail
to trace coverage relationships between test cases and their
corresponding units under test. We argue however that in a first
contact phase, where overall system comprehension as well
as identification of components worth investigating further
are the prime objectives, actual, complete coverage
measurements, which require dynamic analysis,
are too expensive and less suited for visualization. Given


^http://www.codefeed.com/tutorial/ant_config.html 



the underlying test model, graph querying is also a viable 
alternative to reveal indirect testing relations. 
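
As an illustration of such a graph query, the sketch below runs a breadth-first search over a dependency graph and reports every production class reachable from a test case together with its distance. It is a minimal sketch under stated assumptions: the graph representation, the function name and all edges are hypothetical and were not extracted from Ant. In particular, the edge from Project to Untar is assumed for the example only.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_units(
    graph: Dict[str, List[str]],
    test_case: str,
    production_classes: Iterable[str],
) -> Dict[str, int]:
    """Breadth-first search from a test case; returns each reachable
    production class with its distance (number of edges) from the test."""
    production: Set[str] = set(production_classes)
    distance = {test_case: 0}
    queue = deque([test_case])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in distance:
                distance[successor] = distance[node] + 1
                queue.append(successor)
    return {n: d for n, d in distance.items() if n in production}


# Illustrative edges only (assumed, not extracted from the Ant sources).
graph = {
    "UntarTest": ["BuildFileTest"],
    "BuildFileTest": ["Project", "FileUtils"],
    "Project": ["Untar"],
}
units = reachable_units(graph, "UntarTest", {"Project", "FileUtils", "Untar"})
print(units)
```

A distance greater than one indicates that the unit is only reached via intermediate classes, which is how an indirect testing relation would show up in such a query.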

5.2 CPP2FAMIX 

As a second case study we opt for cpp2famix,
a C++ fact extractor of about 13.5 kSLOC of Java code.
Cpp2famix extracts information about a C++ software
system from the AST unit dumps of the GNU Compiler
Collection. This information is transformed into a
re-engineering model. The JUnit test suite accounts for 29%
of the overall system size. We chose this system because the
developer is a colleague of ours, hence we can interview
him thoroughly.



5.2.1 Findings 

First Contact. From Figure [8] we derive a consistent,
per production class unit testing approach. The unit
test code resides in a subpackage test of each
component, except for four components that are weakly
covered or not covered at all. Furthermore, we identify
cpp2famix.test.TestGCCTreeDumpParser as an integration
test, exercising the parsing and filtering of a GCC
tree dump into the system's internal tree
representation. For test dependencies, we notice the test helpers
cpp2famix.node.traversal.test.NodeTraversalTest and
cpp2famix.test.TestWithTreeFragment, helping test cases
with traversing AST representations and with composing
small test data trees respectively.

Understand Units. The developer points out six production
classes that are currently being modified: ClassExtractor,
FieldExtractor, FieldsIterator, Attribute, Clazz and
StatementIterator. Using the Unit under Test Views, we
identify and navigate to the test cases involved:

• Attribute and Clazz are not directly tested as they belong
to generated code, in line with our earlier findings
using the System-Wide View.

• ClassExtractor, FieldsIterator and StatementIterator
are exercised by corresponding *Test test cases.

• FieldExtractor is covered by FieldsIteratorTest as well
as by FieldExtractorTest.






Figure 9. Test case FieldsIteratorTest 



Assess Test Cases. From the Test Case View, we deduce
that quite a few helper objects are needed to test the behavior
of the Extractor and Iterator classes. The class names of
the test helpers, however, reveal that sample pieces of data
are composed to exercise the units under test (Figure [9]). As
such we conclude that these are broadly isolated unit tests.

5.2.2 Validation 

During an interview, we confronted the system's developer
with our analysis. He confirms that a tight unit testing
approach (using JUnit) has been followed, with test
cases being written either just before (test-driven) or just
after the corresponding production code. This results in a
class and method coverage of 90% and 79% respectively.
The developer acknowledges the presence of untested
components, explaining that he did not see the need to
test the generated components cpp2famix.metamodel.famix
and cpp2famix.metamodel.moose. For the classes in
cpp2famix.metamodel, he commented that we were looking
at dead code that was replaced by the classes in the two
subpackages. Finally, cpp2famix.extractors.* only contains
simple data holders and as such was not considered
worthwhile to unit test.

cpp2famix.test.TestGCCTreeDumpParser is confirmed
to be an integration test, but the developer stated that it
is an old test that even failed when we tried to execute it.
Instead, he points to cpp2famix.test.FAMIXExtractorTest as
being the current integration test, although it is not
implemented in a traditional S-S-V-T style, but is merely a "user" of
the top-level production class. That explains why we did not
identify this test case as one that deserves special attention.

6 Related Work 

We identified the following work in the domain of test 
suite reverse engineering. 

Agrawal et al. introduce a set of techniques to enhance 
program understanding, debugging and testing [2]. Among
others, the xSuds tool suite contains tools to assist develop- 
ers in achieving high test coverage, locating errors as well 
as minimizing regression sets. Via source code coloring, the 
developer perceives the coverage level, erroneous locations 
or execution frequency. 

Gaelli et al. observe that not all unit tests are alike [14].
Therefore, they propose a taxonomy that distinguishes unit tests
based on the focus on one or more methods, type of expected
outcome, etc. Their automated classification approach for
SUnit tests using heuristics achieves a high overall
precision (89%) and a moderate recall (52%). One of the steps
the authors identify as future work involves making explicit
the relationship between unit tests and methods under test.

Van Geet and Zaidman hypothesize that unit tests covering
multiple units are less suited as documentation as such
tests are harder to understand [25]. In a case study involving
the Ant project, the median number of methods executed
by a test command is more than 200, which leads them to
conclude that the test suite of this particular project is not well
suited for documentation purposes.

To gain knowledge about the inner workings of a software
system, Cornelissen et al. use sequence diagrams obtained
from test execution [7]. The use of abstraction, separation
of test stages and stack depth limitations make such
diagrams scalable.

7 Conclusion & Future work 

In this work, we proposed a visualization technique as- 
sisting re-engineers to explore the composition of an object- 
oriented system's unit test suite. We propose three graph- 
based views, representing (aspects of) a test suite in terms 
of the S-S-V-T cycle's test concepts. These views assist a 
re-engineer in building initial understanding and assessing 
opportunities and weaknesses for further evolution. We de- 
scribe how certain visual indicators in these views reveal 
information about the location of test cases, the coverage 
level (in an exploration context) as well as the followed unit 
test strategy. 



We validated the technique by means of two case stud- 
ies. In the first one, we compared the results of our analysis 
with Ant's system documentation as well as with the findings
of other authors. Our initial findings regarding coverage,
key system classes as well as test design were confirmed. 
Secondly, we investigated a system which has been devel- 
oped with a tight unit testing approach. The lead developer 
of this system, CPP2FAMIX, confirmed most of our claims 
about the test suite. 

Based on these two case studies, we conclude that the 
visual exploration technique, as a first contact technique, 
serves its purpose. As a next step in the reverse engineering 
of test suites, we identify a need for finer-grained analysis, 
such as (i) obtaining actual coverage measurements via test 
execution and (ii) incorporating information about size and 
complexity of components for a more detailed assessment. 
We identify the integration of such information, e.g., via 
polymetric views, as future work.